Skip to content

Conversation

@rjzamora
Copy link
Member

@rjzamora rjzamora commented Dec 3, 2025

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@rjzamora rjzamora self-assigned this Dec 3, 2025
@rjzamora rjzamora requested a review from a team as a code owner December 3, 2025 15:43
@rjzamora rjzamora added 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function labels Dec 3, 2025
@rjzamora rjzamora requested review from mroeschke and vyasr December 3, 2025 15:43
@rjzamora rjzamora added the non-breaking Non-breaking change label Dec 3, 2025
@github-actions github-actions bot added Python Affects Python cuDF API. cudf-polars Issues specific to cudf-polars labels Dec 3, 2025
@GPUtester GPUtester moved this to In Progress in cuDF Python Dec 3, 2025
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All changes to this file are directly reverting #20724

Comment on lines -159 to +172
small_df = _concat(
*await get_small_table(context, small_child, small_ch),
context=ir_context,
)
if context.comm().nranks > 1 and not small_duplicated:
small_dfs, small_size = await get_small_table(context, small_child, small_ch)
need_allgather = context.comm().nranks > 1 and not small_duplicated
if (
ir.options[0] != "Inner" or small_size < target_partition_size
) and not need_allgather:
# Pre-concat for non-inner joins, otherwise
# we need a local shuffle, and face additional
# memory pressure anyway.
small_dfs = [_concat(*small_dfs, context=ir_context)]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before #20724, we were only pre-concatenating for non-inner joins (to avoid the extra local shuffle).

After #20724, we were always pre-concatenating.

Now, we pre-concatenate when it makes sense:

  • To avoid the local shuffle (e.g. non-inner joins)
  • We have small data
  • We aren't already concatenating via an allgather

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question about ordering.

@rjzamora rjzamora added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Dec 4, 2025
@rjzamora
Copy link
Member Author

rjzamora commented Dec 4, 2025

/merge

@rapids-bot rapids-bot bot merged commit 052e909 into rapidsai:main Dec 4, 2025
138 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in cuDF Python Dec 4, 2025
@rjzamora rjzamora deleted the revert-bcast-join-simplify branch December 4, 2025 23:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

5 - Ready to Merge Testing and reviews complete, ready to merge cudf-polars Issues specific to cudf-polars improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants